Diffusion-Based Audio Inpainting
Audio inpainting aims to reconstruct missing segments in corrupted
recordings. Previous methods produce plausible reconstructions when the gap
length is shorter than about 100 ms, but the quality decreases for longer
gaps. This paper explores recent advancements in deep learning and,
particularly, diffusion models, for the task of audio inpainting. The proposed
method uses an unconditionally trained generative model, which can be
conditioned in a zero-shot fashion for audio inpainting, offering high
flexibility to regenerate gaps of arbitrary length. An improved deep neural
network architecture based on the constant-Q transform, which allows the model
to exploit pitch-equivariant symmetries in audio, is also presented. The
performance of the proposed algorithm is evaluated through objective and
subjective metrics for the task of reconstructing short to mid-sized gaps. The
results of a formal listening test show that the proposed method delivers
performance comparable to the state of the art for short gaps, while retaining
good audio quality and outperforming the baselines for the longest gap
lengths tested, 150 ms and 200 ms. This work helps improve the restoration of
sound recordings with fairly long local disturbances or dropouts that must
be reconstructed. Comment: Submitted for publication to the Journal of the Audio Engineering Society
on January 30th, 202
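The zero-shot conditioning idea can be illustrated with a minimal sketch, assuming NumPy and one simple conditioning scheme: at every reverse-diffusion step the observed samples overwrite the model's denoised estimate, so the generative prior only has to fill the gap. This replacement step is an illustrative choice, not necessarily the exact procedure used in the paper. `make_gap_mask`, `toy_denoiser`, and `inpaint` are hypothetical names; `toy_denoiser` stands in for a pretrained network (it is the exact denoiser only for unit-variance Gaussian data), and the noise schedule is assumed.

```python
import numpy as np


def make_gap_mask(n_samples, fs, gap_start_s, gap_ms):
    """Return 1.0 where samples are observed and 0.0 inside the gap."""
    mask = np.ones(n_samples)
    start = int(round(gap_start_s * fs))
    length = int(round(gap_ms * 1e-3 * fs))
    mask[start:start + length] = 0.0
    return mask


def toy_denoiser(x, sigma):
    """Hypothetical stand-in for a pretrained network D(x, sigma).

    For unit-variance Gaussian data, E[x0 | x0 + sigma * n] = x / (1 + sigma**2),
    so this is the exact denoiser for that (very unmusical) prior.
    """
    return x / (1.0 + sigma**2)


def inpaint(y, mask, denoiser, sigmas, rng):
    """Deterministic Euler sampler with replacement-based zero-shot conditioning."""
    x = sigmas[0] * rng.standard_normal(y.shape)
    for s_cur, s_next in zip(sigmas[:-1], sigmas[1:]):
        x0_hat = denoiser(x, s_cur)
        # Keep the observed samples; the model only fills the masked gap.
        x0_hat = mask * y + (1.0 - mask) * x0_hat
        # Euler step of the probability-flow ODE dx/dsigma = (x - x0_hat) / sigma.
        x = x + (s_next - s_cur) * (x - x0_hat) / s_cur
    return mask * y + (1.0 - mask) * x


fs = 16000
rng = np.random.default_rng(0)
y = rng.standard_normal(fs)  # 1 s of placeholder "audio"
mask = make_gap_mask(len(y), fs, gap_start_s=0.4, gap_ms=200)
sigmas = np.append(np.geomspace(10.0, 0.01, 30), 0.0)  # assumed noise schedule
x_hat = inpaint(y * mask, mask, toy_denoiser, sigmas, rng)
```

With a real pretrained model, `toy_denoiser` would be replaced by the network. Nothing in the sampler depends on the gap length (200 ms, or 3200 samples at 16 kHz, here), which is what makes the unconditional model flexible with respect to gap size.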
Efficient target-response interpolation for a graphic equalizer
Proceedings of the 41st IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP, held in Shanghai (China) during 20-25 March 2016. A graphic equalizer is an adjustable filter in which the command gain of each frequency band is practically independent of the gains of other bands. Designing a high-precision graphic equalizer requires evaluating a target response that interpolates the magnitude response at several frequency points between the command gains. Good accuracy has previously been achieved with polynomial interpolation methods such as cubic Hermite or spline interpolation. However, these methods require large computational resources, which is a limitation in real-time applications. This paper proposes an efficient way of computing the target response without sacrificing approximation accuracy. The new approach, called Linear Interpolation with Constant Segments (LICS), reduces the computing time of the target response by 55% and has an intrinsically parallel structure. The performance of the LICS method is assessed on an ARM Cortex-A7 core, which is commonly used in embedded systems. This work was conducted in spring 2015 when the first author was a
visiting postdoctoral researcher at Aalto University. This research has been
partly funded by the TIN2014-53495-R and TIN2011-23283 projects of the
Ministerio de Economía y Competitividad and FEDER.
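The abstract does not describe how LICS builds its segments, so the sketch below shows only the general notion of a target response evaluated by cheap interpolation between command gains: piecewise-linear interpolation on a log2 frequency axis with NumPy's `np.interp`, which holds the end values constant outside the outermost bands. This is an assumed baseline for illustration, not the LICS algorithm; the octave-band frequencies, gain settings, and the name `linear_log_target` are hypothetical.

```python
import numpy as np


def linear_log_target(f_eval, f_cmd, g_cmd_db):
    """Target magnitude response (dB) by linear interpolation on a log2 frequency axis.

    Below the first and above the last command frequency, np.interp returns the
    end gains unchanged (constant extrapolation).
    """
    return np.interp(np.log2(f_eval), np.log2(f_cmd), g_cmd_db)


f_cmd = 31.25 * 2.0 ** np.arange(10)  # assumed octave-band command frequencies (Hz)
g_cmd = np.array([6, 3, 0, -3, -6, -3, 0, 3, 6, 0], dtype=float)  # command gains (dB)
f_eval = np.geomspace(20.0, 20000.0, 256)  # dense evaluation grid (Hz)
target = linear_log_target(f_eval, f_cmd, g_cmd)
```

Each output point needs only a search for the bracketing command frequencies and one weighted average, which is why linear schemes are cheaper than cubic Hermite or spline interpolation; the trade-off is a target with corners at the command frequencies.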
EQ
Equalization is widely used in acoustics and audio engineering, for example to correct the frequency response of a sound reproduction system. The design of equalizers (EQs) has advanced considerably in recent years. In this article, we focus on graphic equalizers, which are challenging to design. We present two principles for implementing an equalizer: the cascade structure and the parallel structure. The latest graphic equalizers we have developed meet the critical hi-fi requirement that the frequency response must match the settings to within one decibel. In the cascade structure of a graphic equalizer, this is achieved by choosing a suitable parametric filter for each band, setting the filter bandwidths so that the effect of each gain on the neighboring bands is known, and solving the gains of the band filters with the least-squares method. An accurate and efficient parallel graphic equalizer is obtained by converting the cascade structure into a delayed parallel form, which is a novelty in this field. Because updating the parameters of octave and third-octave equalizers designed with these methods requires a great deal of computation, we have replaced the gain optimization with an artificial neural network. Thanks to the methods we have developed, the design problem of octave and third-octave graphic equalizers is now practically solved. Non peer reviewed
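The least-squares step for the cascade structure can be sketched as follows, assuming NumPy, second-order peaking filters from the widely used Audio EQ Cookbook formulas, octave bands from 31.25 Hz to 16 kHz, and a band Q of 1.41; these choices, the prototype gain, and the function names are assumptions, not values from the article. Each band filter is probed at a prototype gain to build an interaction matrix recording how much one decibel of gain in one band leaks into every band centre, and the filter gains are then solved from the command gains by least squares.

```python
import numpy as np


def peaking_db(f, f0, gain_db, q, fs):
    """Magnitude (dB) of an Audio EQ Cookbook peaking filter evaluated at frequencies f."""
    a = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * np.pi * f0 / fs
    alpha = np.sin(w0) / (2.0 * q)
    b = np.array([1 + alpha * a, -2 * np.cos(w0), 1 - alpha * a])
    den = np.array([1 + alpha / a, -2 * np.cos(w0), 1 - alpha / a])
    zinv = np.exp(-2j * np.pi * np.asarray(f, dtype=float) / fs)  # z^-1 on the unit circle
    h = (b[0] + b[1] * zinv + b[2] * zinv**2) / (den[0] + den[1] * zinv + den[2] * zinv**2)
    return 20.0 * np.log10(np.abs(h))


fs = 48000.0
fc = 31.25 * 2.0 ** np.arange(10)  # octave-band centre frequencies (Hz)
q = 1.41                           # assumed band Q, not from the article
g_proto = 12.0                     # assumed prototype gain (dB) used to probe interactions

# Interaction matrix: column m is band m's dB response at every centre, per dB of gain.
B = np.column_stack([peaking_db(fc, f0, g_proto, q, fs) / g_proto for f0 in fc])

target_db = np.array([12, -12, 12, -12, 0, 0, 6, 6, -6, 3], dtype=float)  # command gains
g_filters = np.linalg.lstsq(B, target_db, rcond=None)[0]

# Actual cascade response at the centres: dB responses of cascaded filters add up.
cascade_db = sum(peaking_db(fc, f0, g, q, fs) for f0, g in zip(fc, g_filters))
max_err_db = np.max(np.abs(cascade_db - target_db))
```

Because a peaking filter's dB response does not scale exactly linearly with its gain, `max_err_db` shows how far the real cascade deviates from the commands; reaching a 1-dB tolerance for all settings may need refinement beyond this single linear solve.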
Solving Audio Inverse Problems with a Diffusion Model
This paper presents CQT-Diff, a data-driven generative audio model that can,
once trained, be used for solving various audio inverse problems in a
problem-agnostic setting. CQT-Diff is a neural diffusion model with an
architecture that is carefully constructed to exploit pitch-equivariant
symmetries in music. This is achieved by preconditioning the model with an
invertible Constant-Q Transform (CQT), whose logarithmically-spaced frequency
axis represents pitch equivariance as translation equivariance. The proposed
method is evaluated with objective and subjective metrics on three different
tasks: audio bandwidth extension, inpainting, and declipping. The
results show that CQT-Diff outperforms the compared baselines and ablations in
audio bandwidth extension and, without retraining, delivers competitive
performance against modern baselines in audio inpainting and declipping. This
work represents the first diffusion-based general framework for solving inverse
problems in audio processing. Comment: Submitted to ICASSP 202
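The claim that a logarithmic frequency axis turns pitch equivariance into translation equivariance can be checked numerically. With B bins per octave, CQT centre frequencies are f_k = f_min 2^(k/B), so transposing a sound by s semitones multiplies every frequency by 2^(s/12) and moves every component by the same s B / 12 bins. The sketch below assumes NumPy, f_min = 32.7 Hz, and B = 24; the function names are hypothetical, and it shows only the frequency mapping, not an invertible CQT implementation.

```python
import numpy as np


def cqt_center_frequencies(f_min, bins_per_octave, n_bins):
    """Geometrically spaced centre frequencies f_k = f_min * 2**(k / B)."""
    return f_min * 2.0 ** (np.arange(n_bins) / bins_per_octave)


def fractional_bin(freq, f_min, bins_per_octave):
    """Continuous CQT bin position of a frequency (inverse of the mapping above)."""
    return bins_per_octave * np.log2(np.asarray(freq, dtype=float) / f_min)


f_min, B = 32.7, 24                    # assumed: roughly C1, quarter-tone resolution
partials = 110.0 * np.arange(1, 9)     # harmonic series of A2 (Hz)
shifted = partials * 2.0 ** (3 / 12)   # transpose up three semitones

shift_in_bins = fractional_bin(shifted, f_min, B) - fractional_bin(partials, f_min, B)
linear_shift = shifted - partials      # the same transposition measured in Hz
centers = cqt_center_frequencies(f_min, B, 25)
```

Every partial moves by exactly 6 bins on the CQT axis, whereas `linear_shift` grows with the partial number, so a convolutional network sees a transposition as a plain translation on the CQT axis but not on a linear one.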
Real-time emulation of the Clavinet
Leonardo Gabrielli, Vesa Välimäki, Stefan Bilbao
Zero-Shot Blind Audio Bandwidth Extension
Audio bandwidth extension involves the realistic reconstruction of
high-frequency spectra from bandlimited observations. In cases where the
lowpass degradation is unknown, such as in restoring historical audio
recordings, this becomes a blind problem. This paper introduces a novel method
called BABE (Blind Audio Bandwidth Extension) that addresses the blind problem
in a zero-shot setting, leveraging the generative priors of a pre-trained
unconditional diffusion model. During the inference process, BABE utilizes a
generalized version of diffusion posterior sampling, where the degradation
operator is unknown but parametrized and inferred iteratively. The performance
of the proposed method is evaluated using objective and subjective metrics, and
the results show that BABE surpasses state-of-the-art blind bandwidth extension
baselines and achieves competitive performance compared to non-blind
filter-informed methods when tested with synthetic data. Moreover, BABE
exhibits robust generalization capabilities when enhancing real historical
recordings, effectively reconstructing the missing high-frequency content while
maintaining coherence with the original recording. Subjective preference tests
confirm that BABE significantly improves the audio quality of historical music
recordings. Examples of historical recordings restored with the proposed method
are available on the companion webpage:
(http://research.spa.aalto.fi/publications/papers/ieee-taslp-babe/). Comment: Submitted to IEEE/ACM Transactions on Audio, Speech and Language
Processing
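The idea of a degradation operator that is unknown but parametrized can be illustrated by fitting a single filter parameter. The sketch below assumes NumPy, a hypothetical lowpass that is flat up to a cutoff and then rolls off linearly in dB per octave, and a plain grid search over candidate cutoffs that minimizes the squared dB error between the filtered current estimate and the observation. BABE's actual filter parametrization and its iterative update inside diffusion posterior sampling are not given in the abstract, so this only isolates the parameter-estimation half of such a loop; all names and numbers are illustrative.

```python
import numpy as np


def lowpass_db(f, fc, slope_db_per_oct):
    """Hypothetical degradation: 0 dB up to fc, then a straight roll-off in dB/octave."""
    return -slope_db_per_oct * np.maximum(np.log2(f / fc), 0.0)


def estimate_cutoff(ref_db, obs_db, f, candidates, slope_db_per_oct):
    """Pick the cutoff whose filtered reference spectrum best matches the observation."""
    errors = [np.mean((ref_db + lowpass_db(f, fc, slope_db_per_oct) - obs_db) ** 2)
              for fc in candidates]
    return candidates[int(np.argmin(errors))]


f = np.linspace(20.0, 11025.0, 512)               # analysis frequencies (Hz)
ref_db = -3.0 * np.log2(f / 20.0)                 # placeholder "current estimate" spectrum
obs_db = ref_db + lowpass_db(f, 3000.0, 24.0)     # synthetic bandlimited observation
candidates = np.arange(1000.0, 8001.0, 250.0)     # candidate cutoffs (Hz)
fc_hat = estimate_cutoff(ref_db, obs_db, f, candidates, 24.0)
```

In a full blind method, `ref_db` would come from the current diffusion estimate and be refreshed between parameter updates, so the filter parameters and the restored signal are refined together rather than in one pass as here.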
Adversarial Guitar Amplifier Modelling With Unpaired Data
We propose an audio effects processing framework that learns to emulate a
target electric guitar tone from a recording. We train a deep neural network
using an adversarial approach, with the goal of transforming the timbre of a
guitar into the timbre of another guitar after audio effects processing has
been applied, for example, by a guitar amplifier. The model training requires
no paired data, and the resulting model emulates the target timbre well whilst
being capable of real-time processing on a modern personal computer. To verify
our approach, we present two experiments: one that carries out unpaired
training using paired data, allowing us to monitor training via objective
metrics, and another that uses fully unpaired data, corresponding to a
realistic scenario where a user wants to emulate a guitar timbre using only
audio data from a recording. Our listening test results confirm that the models
are perceptually convincing.
Five Variations on a Feedback Theme
This is a study on a set of feedback amplitude modulation oscillator
equations. It is based on a very simple and inexpensive algorithm
that can generate a complex spectrum from
a sinusoidal input. We examine the original and five variations
on it, discussing the details of each synthesis method. These include
the addition of extra delay terms, waveshaping of the feedback
signal, further heterodyning and increasing the loop delay.
In addition, we provide a software implementation of these
algorithms as a practical example of their application and as a
demonstration of their potential for synthesis instrument design …
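The abstract does not state the oscillator equations, so the sketch below assumes the basic feedback AM form y[n] = x[n](1 + beta y[n - L]) with a cosine input x[n] and a loop delay of L samples; L = 1 is taken as the original, and a larger L stands for the "increasing the loop delay" variation. It is a plain NumPy loop with hypothetical names, written for clarity rather than speed. For |beta| < 1 the output stays bounded by 1/(1 - |beta|), because |y[n]| <= 1 + |beta| |y[n - L]|.

```python
import numpy as np


def fbam(f0, fs, n_samples, beta, loop_delay=1):
    """Feedback AM oscillator, assuming y[n] = x[n] * (1 + beta * y[n - loop_delay])."""
    n = np.arange(n_samples)
    x = np.cos(2.0 * np.pi * f0 / fs * n)  # sinusoidal input
    y = np.zeros(n_samples)
    for i in range(n_samples):
        prev = y[i - loop_delay] if i >= loop_delay else 0.0  # zero initial state
        y[i] = x[i] * (1.0 + beta * prev)
    return y


fs, f0, n_samples = 1000.0, 10.0, 1000      # f0 falls exactly on rfft bin 10
y_plain = fbam(f0, fs, n_samples, beta=0.0)  # no feedback: just the input cosine
y_fb = fbam(f0, fs, n_samples, beta=0.5)     # feedback creates harmonics of f0
y_delay2 = fbam(f0, fs, n_samples, beta=0.5, loop_delay=2)  # longer loop variation

Y_plain = np.abs(np.fft.rfft(y_plain))
Y_fb = np.abs(np.fft.rfft(y_fb))
```

With beta = 0 the output is the input sinusoid; with beta = 0.5 the cosine is multiplied by delayed copies of itself, so energy appears at the second harmonic (rfft bin 20 here) and beyond, which is the "complex spectrum from a sinusoidal input" the study builds on.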